ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Han, Dong, Ai, Zhehong, Cai, Pengxiang, Lu, Shanya, Chen, Jianpeng, Ye, Zihao, Sun, Shuzhou, Gao, Ben, Ge, Lingli, Wang, Weida, Zhou, Xiangxin, Liu, Xihui, Su, Mao, Ouyang, Wanli, Bai, Lei, Zhou, Dongzhan, Xu, Tao, Li, Yuqiang, Zhang, Shufei

arXiv.org Artificial Intelligence

Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by sparse experimental data and vast search spaces. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. First, the data-driven strategy uses an 8B-scale LLM regressor, fine-tuned on a mere 1% of labeled samples, to generate pseudo-data that robustly initializes the optimization process. Second, the knowledge-driven strategy employs a hybrid retrieval-augmented generation approach to guide the LLM in dividing the search space while mitigating hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this partition. Operating on the LLM-refined subspaces and supported by LLM-generated data, BO gains both effectiveness and efficiency. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS sets a new state of the art, accelerating optimization by up to 5-fold compared to baseline methods.
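The subspace-selection step the abstract describes can be illustrated with a minimal UCB1-style sketch. This is our own illustration of the general technique, not ChemBOMAS's actual implementation: the scoring constant, data layout, and function names are assumptions.

```python
import math

def ucb_score(mean, count, total, c=1.0):
    """UCB1-style score: observed mean plus an exploration bonus
    that shrinks as a subspace accumulates evaluations."""
    if count == 0:
        return float("inf")  # unexplored subspaces are tried first
    return mean + c * math.sqrt(math.log(total) / count)

def select_subspace(subspaces):
    """Return the index of the subspace with the highest UCB score.

    `subspaces` is a list of dicts holding the running mean objective
    value ('mean') and the number of evaluations ('count') so far.
    """
    total = sum(s["count"] for s in subspaces) or 1
    scores = [ucb_score(s["mean"], s["count"], total) for s in subspaces]
    return max(range(len(subspaces)), key=lambda i: scores[i])

# Toy example: three subspaces with different observed reaction yields.
subspaces = [
    {"mean": 0.62, "count": 10},
    {"mean": 0.55, "count": 3},
    {"mean": 0.00, "count": 0},  # never sampled, so it is selected first
]
print(select_subspace(subspaces))  # -> 2
```

Once every subspace has been sampled, the exploration bonus trades off between the best-looking subspace and under-sampled ones, which is the behavior the paper relies on to focus BO on high-potential regions.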



Podcasts as a Medium for Participation in Collective Action: A Case Study of Black Lives Matter

Moldovan, Theodora, Pera, Arianna, Vega, Davide, Aiello, Luca Maria

arXiv.org Artificial Intelligence

We study how participation in collective action is articulated in podcast discussions, using the Black Lives Matter (BLM) movement as a case study. While research on collective action discourse has primarily focused on text-based content, this study takes a first step toward analyzing audio formats by using podcast transcripts. Using the Structured Podcast Research Corpus (SPoRC), we investigated spoken language expressions of participation in collective action, categorized as problem-solution, call-to-action, intention, and execution. We identified podcast episodes discussing racial justice after important BLM-related events in May and June of 2020, and extracted participatory statements using a layered framework adapted from prior work on social media. We examined the emotional dimensions of these statements, detecting eight key emotions and their association with varying stages of activism. We found that emotional profiles vary by stage, with different positive emotions standing out during calls-to-action, intention, and execution. We detected negative associations between collective action and negative emotions, contrary to theoretical expectations. Our work contributes to a better understanding of how activism is expressed in spoken digital discourse and how emotional framing may depend on the format of the discussion.


Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper

Mutisya, Fred, Gitau, Shikoh, Syovata, Christine, Oigara, Diana, Matende, Ibrahim, Aden, Muna, Ali, Munira, Nyotu, Ryan, Marion, Diana, Nyangena, Job, Ongoma, Nasubo, Mbae, Keith, Wamicha, Elizabeth, Mibuari, Eric, Nsengemana, Jean Philbert, Chidede, Talkmore

arXiv.org Artificial Intelligence

Large Language Models (LLMs) hold promise for improving healthcare access in low-resource settings, but their effectiveness in African primary care contexts remains under-explored. We present a rigorous methodology for creating a benchmark dataset and evaluation framework focused on Kenyan Level 2-3 (dispensary and health center) clinical care. Our approach leverages retrieval-augmented generation (RAG) to ground questions and answers in Kenya's national clinical guidelines, ensuring content aligns with the local standard of care. The guidelines were digitised, chunked, and indexed for efficient semantic retrieval. Gemini Flash 2.0 Lite was then prompted with relevant guideline excerpts to generate realistic clinical questions, multiple-choice answers, and reasoning scenarios with source citations in English and Swahili. We engaged Kenyan physicians in a co-creation process to refine the dataset's relevance and fairness, and instituted a blinded expert validation pipeline to review for clinical accuracy, clarity, and cultural appropriateness. The resulting Alama Health QA dataset comprises thousands of regulator-aligned question-answer pairs spanning common outpatient conditions in English and Swahili. Beyond standard accuracy metrics, we propose evaluation measures targeting clinical reasoning, safety, and adaptability. Initial results highlight significant performance gaps in state-of-the-art LLMs when confronted with localized scenarios, echoing recent findings that LLM accuracy on African medical questions lags behind performance on U.S. benchmarks. Our work demonstrates a pathway for dynamic, locally-grounded benchmarks that can evolve with guidelines, providing a crucial tool for safe and effective deployment of AI in African healthcare. Advances in large language models have spurred interest in their potential to augment medical services, especially in low- and middle-income countries facing clinician shortages (Bekbolatova et al., 2024).
By handling routine queries or providing decision support, LLMs might help bridge gaps in healthcare access across Africa.
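The digitise-chunk-index-retrieve pipeline described above can be sketched in a few lines. This is a generic illustration under stated assumptions: real systems (including, presumably, this one) use a learned sentence-embedding model rather than the toy bag-of-words similarity shown here, and all function names are our own.

```python
import math
import re
from collections import Counter

def chunk(text, size=40):
    """Split a guideline document into overlapping chunks of `size`
    words, with 50% overlap so passages are not cut mid-topic."""
    words = text.split()
    step = max(size // 2, 1)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(text):
    """Toy bag-of-words 'embedding'; a production pipeline would call
    a semantic embedding model instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k indexed chunks most similar to the query; these
    excerpts would then be placed in the LLM prompt for grounding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Retrieved excerpts ground the generator in the guideline text, which is what keeps the generated questions aligned with the local standard of care.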


Enhancements for Developing a Comprehensive AI Fairness Assessment Standard

Agarwal, Avinash, Kumar, Mayashankar, Nene, Manisha J.

arXiv.org Artificial Intelligence

Abstract--As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions like autonomous network management and hyper-personalized services. However, as AI applications diversify, the existing TEC Standard requires enhancement to strengthen its impact and broaden its applicability. This paper proposes an expansion of the TEC Standard to include fairness assessments for images, unstructured text, and generative AI, including large language models, ensuring a more comprehensive approach that keeps pace with evolving AI technologies. By incorporating these dimensions, the enhanced framework will promote responsible and trustworthy AI deployment across various sectors. The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies has driven transformative advancements across critical sectors, including telecommunications, healthcare, finance, and public services.


Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning

Gawin, Cole, Sun, Yidan, Kejriwal, Mayank

arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved remarkable performance in generating human-like text and solving reasoning tasks of moderate complexity, such as question answering and mathematical problem solving. However, their capabilities in tasks requiring deeper cognitive skills, such as common-sense understanding and abstract reasoning, remain under-explored. In this paper, we systematically evaluate abstract common-sense reasoning in LLMs using the ConceptNet knowledge graph. We propose two prompting approaches: instruct prompting, where models predict plausible semantic relationships based on provided definitions, and few-shot prompting, where models identify relations using examples as guidance. Our experiments with the gpt-4o-mini model show that instruct prompting yields consistent performance when ranking multiple relations, but performance declines substantially when the model is restricted to predicting only one relation. In few-shot prompting, the model's accuracy improves significantly when selecting from five relations rather than the full set, although with notable bias toward certain relations. These results suggest that significant gaps remain between the abstract common-sense reasoning abilities of even commercially used LLMs and human-level understanding. However, the findings also highlight the promise of careful prompt engineering, based on selective retrieval, for obtaining better performance.
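A few-shot prompt of the kind described, asking a model to pick the ConceptNet relation linking two concepts from a restricted candidate set, might be constructed as follows. The relation names are real ConceptNet relations; the prompt wording and function name are our own illustration, not the paper's exact template.

```python
def few_shot_prompt(examples, head, tail, candidate_relations):
    """Build a few-shot prompt asking for the ConceptNet relation
    between `head` and `tail`, restricted to `candidate_relations`.

    `examples` is a list of (head, relation, tail) demonstrations.
    """
    lines = [
        "Choose the relation that best links the two concepts.",
        "Relations: " + ", ".join(candidate_relations),
        "",
    ]
    for h, r, t in examples:
        lines.append(f"{h} -> {t}: {r}")
    lines.append(f"{head} -> {tail}:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    examples=[("dog", "IsA", "animal"), ("knife", "UsedFor", "cutting")],
    head="hammer",
    tail="driving nails",
    candidate_relations=["IsA", "UsedFor", "PartOf", "HasA", "Causes"],
)
print(prompt)
```

Restricting `candidate_relations` to five options rather than ConceptNet's full relation inventory is exactly the manipulation the abstract reports as improving accuracy.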


Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration

Zhang, Yan, Ratnayake, Tharaka Sachintha, Sew, Cherie, Knibbe, Jarrod, Goncalves, Jorge, Johal, Wafa

arXiv.org Artificial Intelligence

Indirect speech acts (ISAs) are a natural pragmatic feature of human communication, allowing requests to be conveyed implicitly while maintaining subtlety and flexibility. Although advancements in speech recognition have enabled natural language interactions with robots through direct, explicit commands--providing clarity in communication--the rise of large language models presents the potential for robots to interpret ISAs. However, empirical evidence on the effects of ISAs on human-robot collaboration (HRC) remains limited. To address this, we conducted a Wizard-of-Oz study (N=36), engaging a participant and a robot in collaborative physical tasks. Our findings indicate that robots capable of understanding ISAs significantly improve perceived robot anthropomorphism, team performance, and trust. However, the effectiveness of ISAs is task- and context-dependent, thus requiring careful use. These results highlight the importance of appropriately integrating direct and indirect requests in HRC to enhance collaborative experiences and task performance.


Benchmarking Zero-Shot Facial Emotion Annotation with Large Language Models: A Multi-Class and Multi-Frame Approach in DailyLife

Zhang, He, Fu, Xinyi

arXiv.org Artificial Intelligence

This study investigates the feasibility and performance of using large language models (LLMs) to automatically annotate human emotions in everyday scenarios. We conducted experiments on the DailyLife subset of the publicly available FERV39k dataset, employing the GPT-4o-mini model for rapid, zero-shot labeling of key frames extracted from video segments. Under a seven-class emotion taxonomy ("Angry," "Disgust," "Fear," "Happy," "Neutral," "Sad," "Surprise"), the LLM achieved an average precision of approximately 50%. In contrast, when limited to ternary emotion classification (negative/neutral/positive), the average precision increased to approximately 64%. Additionally, we explored a strategy that integrates multiple frames within 1-2 second video clips to enhance labeling performance and reduce costs. The results indicate that this approach can slightly improve annotation accuracy. Overall, our preliminary findings highlight the potential application of zero-shot LLMs in human facial emotion annotation tasks, offering new avenues for reducing labeling costs and broadening the applicability of LLMs in complex multimodal environments.
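The multi-frame strategy described above, combining per-frame labels from a short clip into one annotation, can be sketched with a simple majority vote. The aggregation rule and names here are our own illustration of the general idea, not the paper's exact method.

```python
from collections import Counter

# The seven-class taxonomy used in the study.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]

def aggregate_frame_labels(frame_labels, fallback="Neutral"):
    """Combine per-frame emotion labels (e.g. from zero-shot LLM calls
    on key frames of a 1-2 second clip) into a single clip label by
    majority vote; ties fall back to `fallback`."""
    counts = Counter(frame_labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback  # no clear majority across frames
    return counts[0][0]

print(aggregate_frame_labels(["Happy", "Happy", "Sad"]))  # -> Happy
```

Voting across frames smooths out single-frame misreads, which is one plausible mechanism behind the slight accuracy improvement the abstract reports.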


Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care

Yuan, Fengpei, Hasnaeen, Nehal, Zhang, Ran, Bible, Bryce, Taylor, Joseph Riley, Qi, Hairong, Yao, Fenghui, Zhao, Xiaopeng

arXiv.org Artificial Intelligence

This study explores a novel approach to advancing dementia care by integrating socially assistive robotics, reinforcement learning (RL), large language models (LLMs), and clinical domain expertise within a simulated environment. This integration addresses the critical challenge of limited experimental data in socially assistive robotics for dementia care, providing a dynamic simulation environment that realistically models interactions between persons living with dementia (PLWDs) and robotic caregivers. The proposed framework introduces a probabilistic model to represent the cognitive and emotional states of PLWDs, combined with an LLM-based behavior simulation to emulate their responses. We further develop and train an adaptive RL system enabling humanoid robots, such as Pepper, to deliver context-aware and personalized interactions and assistance based on PLWDs' cognitive and emotional states. The framework also generalizes to computer-based agents, highlighting its versatility. Results demonstrate that the RL system, enhanced by LLMs, effectively interprets and responds to the complex needs of PLWDs, providing tailored caregiving strategies. This research contributes to human-computer and human-robot interaction by offering a customizable AI-driven caregiving platform, advancing understanding of dementia-related challenges, and fostering collaborative innovation in assistive technologies. The proposed approach has the potential to enhance the independence and quality of life for PLWDs while alleviating caregiver burden, underscoring the transformative role of interaction-focused AI systems in dementia care.